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Abstract 

In  decision-making  problems,  the  Bayesian  action  and  its  posterior  expected  loss  usually' 
can  not  be  obtained  analytically,  in  this  paper  we  study  a  Monte  Carlo  method  for  approximat¬ 
ing  the  Bayesian  action  and  its  posterior  expected  loss.  The  Monte  Carlo  approximation  to  the 
Bayesian  action  is  obtained  through ’(IT  approximating  the  posterior  expected  loss  function  by 
using  the  Monte  Carlo  integration  method  andK2>  searching  the  minimum  of  the  approximated 
posterior  expected  loss  function.  As  the  Monte  Carlo  sample  size  diverges  to  infinity,  the 
Monte  Carlo  approximations  are  shown  to  be  convergent  in  very  general  situations.  The  rates 
of  the  convergence  are  also  obtained  under  some  regularity  conditions  on  the  loss  function. 
Two  accuracy  measures  of  the  Monte  Carlo  approximations  are  proposed.  Some  examples  are 
presented  for  illustration.  '  <'y  : 

Key  words:  Bayesian  action;  posterior  expected  loss;  Monte  Carlo  integration;  Almost  sure 
convergence;  Accuracy  measures. 
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1.  INTRODUCTION 


Many  problems  in  Bayesian  analysis  can  only  be  solved  numerically.  In  this  article,  we 
study  the  Monte  Carlo  method  for  determining  the  Bayesian  action  in  a  decision-making  prob¬ 
lem.  Let  X  be  a  random  n  -vector  whose  distribution  has  a  density  /  {x  1 0)  (with  respect  to  a 
measure  v),  where  0e  0  is  an  unknown  parameter  and  0  is  a  subset  of  Rp .  For  a  particular 
problem  (e.g.,  the  estimation  of  0),  a  decision  is  to  be  made  after  observing  the  data  X-  x. 
These  decisions  are  commonly  referred  to  as  actions  in  the  literature.  We  will  denote  a  partic¬ 
ular  action  by  a  and  the  set  of  all  possible  actions  under  consideration  by  Cl .  Let  L (0,  a)  be 
the  loss  incurred  when  the  action  a  is  taken  and  0  turns  out  to  be  the  true  parameter.  L  (0,  a ) 
is  assumed  to  be  a  nonnegative  function  defined  on  0x<2  (measurable  in  0  for  each  a ).  Sup¬ 
pose  that  7t(0)  is  the  density  (with  respect  to  a  measure  jj.)  of  a  prior  distribution  on  0.  The 
posterior  expected  loss  of  an  action  a ,  given  the  data  x ,  is  defined  to  be 

r(a)  =  /e^(0.  a)p(Q\x)d\i,  (1.1) 

where 

p  (0 1  x )  =  /  (x  1 0)7t(0)/M  (x ) 

is  the  posterior  of  0  and 

M  Oc )  =  J_  /  (*  1 9)rc(0)d  It.  (1.2) 

A  Bayesian  optimal  action  is  then  an  action  a *  which  minimizes  r(a),  or  equivalently,  an 
action  a  *  which  minimizes 

P (fl)  =  |@L (0,  a)f  {x  1 0)tt(0)d p..  (1.3) 

If  the  problem  under  consideration  does  not  involve  a  conjugate  prior-likelihood  pair  or 
the  loss  function  is  not  of  a  special  form  (e.g.,  the  squared  error  loss),  this  minimization  prob¬ 
lem  can  not  be  solved  explicitly.  That  is,  a*  does  not  have  a  closed  form  since  the  integrals 
(1.1 )-( 1 .3)  usually  do  not  have  explicit  forms.  Consequently,  numerical  methods  for  calculat¬ 
ing  the  integrals  (1.1)-(1.3)  and  the  Bayesian  action  a *  are  required.  In  particular,  one  useful 
method  for  calculating  the  posterior  moments  j  g(0)p(0ix)dp.  is  the  Monte  Carlo  method, 
which  was  studied  in  Kloek  and  van  Dijk  (1978),  Stewart  (1979,  1983)  and  Geweke  (1986, 
1988a,b).  A  Monte  Carlo  method  for  calculating  the  Bayesian  action  is  described  as  follows: 


I 


Step  1.  Choose  a  probability  density  h(B)  (with  respect  to  |i)  such  that  the  support  of  h(Q) 
includes  that  of  7t(0).  Let  Ph  be  the  probability  distribution  corresponding  to  h  (0).  Generate 
m  independent  and  identically  distributed  (i.i.d.)  random  vectors  0lt  02,...,  0m  from  Ph. 

Step  2.  Approximate  p(a )  for  any  a  e  Q,  by 

pm{a,  to)  =  m'lYT=  iL(9t*  a)w(0t),  (1.4) 

where  co  denotes  a  particular  sequence  {  0(-  } /I  j  and 

w(0)=/(jc(0)jc(0)/7t(0).  (1.5) 

Step  3,  Find  an  am( co)e(2  which  minimizes  p m(a,  co),  i.e., 

pm(co)  =  pm(am(co),  co)  =  min/-  pm(a,  co):  aefl  }.  (1.6) 

am=am(co)  is  the  Monte  Carlo  approximation  to  the  Bayesian  action  a*.  For  the  compu¬ 
tation  of  am( co),  working  with  p(a)  is  preferable  to  working  with  r(a )  since  it  avoids  the  com¬ 
putation  of  m(x).  A  Monte  Carlo  approximation  to  r(am  (co)),  the  posterior  expected  loss  of 
om(co),  is  rm(co)  =  pm(co)/Mm(co),  where  pOT(co)  is  defined  in  (1.6)  and 
m m (co)  =  rm(£°)  can  also  be  used  to  estimate  the  minimum  posterior  expected 

loss  r{a*). 

The  function  h  in  Step  1  is  called  the  importance  function.  It  should  be  chosen  so  that 
the  random  generation  process  in  Step  1  can  be  easily  carried  out.  The  accuracy  of  the  Monte 
Carlo  approximations  depends  on  the  choice  of  h.  Generally  speaking,  h  should  mimic  the 
posterior  p(Q\x)  and  satisfy  several  technical  conditions  such  as  B3,  (2.8)  and  (3.1)  in  Sections 
2  and  3.  Extensive  discussions  on  how  to  choose  this  function  can  be  found  in  Berger  (1985, 
Section  4.9)  and  Geweke  (1988b).  In  Step  2,  we  have  reduced  computation  by  using  the  same 
random  sequence  {  0,-  }  flx  for  computing  p m(a,  co)  in  (1.4)  for  all  aed .  Not  only  does  this 
method  allow  us  to  use  a  large  m,  but  it  also  results  in  certain  desirable  properties  for  am, 
which  will  be  discussed  in  the  next  section.  In  Step  3,  one  may  employ  any  available  algo¬ 
rithm  for  solving  a  minimization  problem. 

Throughout  the  paper,  we  assume  that  the  integrals  (1.2)-(1.3)  are  finite  and  the  likelihood 
/  (jc  1 0)  and  the  prior  7t(0)  can  be  evaluated  easily  when  x  and  0  are  known.  No  other  assump¬ 
tion  on  /  (x  1 0)  and  7t(0)  is  made.  Also,  there  is  no  restriction  on  the  length  of  the  vector  x , 
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i.e.,  the  total  number  of  observations.  If  rt(0)  is  a  hierarchical  prior  of  the  form 

TC(0)  =  I  X)7C2(X.)£f  T, 

where  x  is  a  measure  on  AcRL  then  Jt(0)  is  not  easy  to  evaluate  (for  a  given  0)  and  the  gen¬ 
eration  of  0,’s  from  7t(0)  may  also  be  difficult  However,  we  may  draw  m  i.i.d.  random 
ip  +s  )-vectors  ^=(9; ,  from  the  distribution  with  density  h  (£)  (with  respect  to  jixr  on 
0xA),  and  compute 

pm(a,  co)  =  a)f  (x  1 0£ )rc1(0i  lX()7t2(^)//i(^). 

The  rest  of  this  paper  is  organized  as  follows.  In  Section  2,  convergence  of  the  Monte 
Carlo  approximations  am  (co)  and  rm  (co)  is  studied.  The  rates  of  convergence  are  also  obtained. 
In  Section  3,  we  propose  two  measures  which  can  be  used  to  indicate  the  accuracy  of  the 
Monte  Carlo  approximations.  Section  4  contains  an  example  of  application. 

2.  LIMITING  BEHAVIOR  OF  THE  MONTE  CARLO  APPROXIMATIONS 
2.1.  Convergence 

Here,  an  immediate  question  is  whether  or  not  am  converges  to  a*  as  m  diverges  to 
infinity.  Since  am=am( co)  is  a  function  of  random  quantity  co,  we  say  that  am  converges  to  a* 
almost  surely  ( a.s . )  if  for  almost  all  co  (with  respect  to  Ph),  am(co)-»a*  as  m 

To  provide  an  answer  to  the  above  question,  let  us  first  consider  the  simple  case  of  finite 
action  problems,  i.e.,  Cl  consists  of  finitely  many  actions  a(1),  a(2) . a^k\  The  most  impor¬ 

tant  examples  are  hypothesis  testing  and  classification  problems.  See  also  Example  2  in  Sec¬ 
tion  4.  Without  loss  of  generality,  assume  that  a(t)  are  Bayesian  actions,  where  i  is  an 

integer  ^  k.  Let  £=2~1mini</Sjkf  p(a(,))-p(a^)  }.  Since  k  is  fixed,  it  follows  from  the  strong 
law  of  large  numbers  (SLLN)  that  for  almost  all  co,  there  is  an  integer  such  that  when 
m>mc B, 

p co)  <  p(a(/))  +  e  =  p(a(1))  +  e  <  p(a(/))  -  e  <  p m(a(l\  co) 

holds  for  any  j<>i  and  l>i.  This  proves  that  am{sa)=a^)  for  a  j <i ,  i.e.,  am(co)  is  a  Bayesian 
action  for  m>m(Sj.  From  the  SLLN,  Mm(co)  m(x)  a.s.  Thus,  both  r(am (co))  and  rm( co)  con¬ 
verge  to  r(a(1))  a.s.  If,  in  addition,  the  Bayesian  action  is  unique  0=1),  then 
am(co) o(1)  a.s. 
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The  above  results  can  be  extended  to  the  case  where  G  is  a  compact  subset  of  Rq .  We 
state  the  following  result  Its  proof  uses  a  similar  argument  to  the  above  discussion  and  a  uni¬ 
form  SLLN  and  is  given  in  the  Appendix. 


Theorem  1.  Suppose  that  G  is  a  compact  subset  of  Rq .  Let 

GB  -  {  a  :  a  is  a  Bayesian  action  }  (2.1) 


and  for  a  fixed  0), 

A&  =  {  a:  a  is  a  limit  point  of  {  am( co)  }. 

If  L(0,  a)  is  continuous  in  a  for  each  06  0  and  there  is  a  measurable  function  M(Q)  such  that 

sup{  L(0,  a):  aeG  }  <M(0)  and  J  M (0)/ (x  1 0)rc(0)d p  <  «>, 

© 

then  there  exists  a  Bayesian  action  and  the  following  results  hold: 

for  almost  all  co,  (2.2) 


and 


r(am(co))  ->  r(a*)  a.s.  and  rm(co)  -4  r{a*)  a.s.  (2.3) 

If  the  Bayesian  action  is  unique,  then  am( CO)  a*  a.s. 


When  a*  is  not  unique,  am( co)  may  not  converge  since  {  am (to) }  “=1  may  have  several 
limit  points.  However,  am  (to)  can  still  be  used  in  practice  as  a  good  decision  since  the  poste¬ 
rior  expected  loss  of  am((o)  converges  to  the  minimum  posterior  expected  loss.  In  addition, 
am(co)  is  close  to  a  Bayesian  action  in  the  sense  of  (2.2).  Regardless  of  uniqueness  of  a*, 
rm(( o)  is  close  to  r(am(c o))  (or  r(a*))  according  to  (2.3). 

Often  <2  is  a  convex  subset  of  R?  (e.g.,  Cl  =0  for  an  estimation  problem)  and  Cl  is  not 
compact.  In  this  case,  let  G'  be  the  closure  of  G  and  assume  that  the  loss  function  L  (0,  a )  is 
defined  on  ®xG'.  Furthermore,  let  G  be  an  unbounded  subset  of  R? .  (If  G  is  bounded,  then 
G'  is  compact  and  Theorem  1  applies.)  To  establish  the  convergence  of  am  in  this  case,  we 
need  some  additional  regularity  conditions.  In  the  following,  B(c)  and  N(c)  denote  the  sets 
{  aeG':  II  a  -a*  II  =  c  }  and  {  aeG':  II  a -a*  II  ^  c  }  for  a  positive  constant  c,  where  II  II  is 
the  Euclidean  norm  on  R*7 . 
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Condition  A. 


Al.  d  is  convex  and  L(0,  a)  is  a  convex  continuous  function  of  a  for  each  0e  0. 

A2.  r(a)<<»  for  all  aed  and  there  exists  an  a  *  e  <2  such  that 

r(a*)  =  minf  r(a):aed  }. 

A3.  There  is  a  nonempty  set  B(c )  such  that 

inf  {  r(a ):  aeB(c)  }  >  r(a*). 

A4.  There  is  a  measurable  function  M  (0)  such  that 

sup  {  L  (0,  a ):  a  e  N(c )  }  <  M  (0)  and  J  M  (0)/  (x  1 0)rc(0)d |i  <  «>. 

0 

Examples  of  convex  loss  functions  which  are  commonly  used  in  the  estimation  of  scalar  0 
are  the  squared  error  loss  (0-a  )2  and  the  absolute  error  loss  I  Q-a  I .  See  also  Example  2  in 
Section  4.  For  vector  0=(  0^\...,0^  )x  and  a=(  )T,  a  commonly  used  loss  is  the 

weighted  average  of  the  loss  functions  L,  (0(l \  which  satisfies  Al  if  each  Li  does. 

Under  Al,  r(a)  is  convex.  Condition  A2  simply  says  that  a  Bayesian  action  for  the  prob¬ 
lem  under  consideration  exists.  Let  UB  be  defined  as  in  (2.1)  and 

Cl'B={  aed':  r(a)=r(a *)  J. 

Then  dBczd'B  and  dB=  dB  iff  r(a)>r(a*)  for  all  a  in  the  boundary  of  d  but  not  in  d  . 
Condition  A3  rules  out  the  possibility  that  r(a)  is  too  flat  in  the  sense  that  dB  is  unbounded. 
Condition  A3  is  also  necessary  for  dB  to  be  bounded.  A  sufficient  condition  for  A3  is  that 
L(0,  a)  is  strictly  convex,  which  also  ensures  the  uniqueness  of  the  Bayesian  action.  Condi¬ 
tion  A4  is  mild  since  N(c)  is  compact  and  L(0,  a)  is  continuous  in  a. 

Theorem  2.  Under  Condition  A,  the  assertions  of  Theorem  1  hold  (with  dB  replaced  by  d'B ). 

The  proof  of  Theorem  2  is  given  in  the  Appendix.  The  convergence  of  am  (oo)  in  the 
situation  where  the  loss  function  is  non-convex  is  studied  in  Shao  (1988). 

2.2.  Convergence  Rates  and  Asymptotic  Distributions 

The  convergence  rates  and  asymptoiic  distributions  of  am (co)  and  rm( co)  are  obtained 
under  some  further  regularity  conditions  on  the  loss  function.  The  variances  of  the  asymptotic 
distributions  can  be  used  as  accuracy  measures  of  am(co)  and  rm{ co)  (see  Section  3).  Let 
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Z/(0,  a)  and  L"(0,  a)  be  the  gradient  3 L(0,  a)/3a  and  the  Hessian  mat  rrL  (0,  a)/dadaz, 
respectively. 

Condition  B. 

Bl.  A1  and  A2  hold. 

B2.  For  each  0,  L(0,  a)  is  a  twice  continuously  differentiable  function  of  a  in  a 
neighborhood  N(c )  of  a  * . 

B3.  J  II 1/(0,  fl*)ll2w'2(0)A(0)d)j.  <  oo,  where  w(0)  is  defined  in  (1.5). 

B4.  There  is  a  measurable  function  v(0)  such  that  for  any  (k,  /), 

supf  IL&(9,  a) I:  aeS(c)  }  <>  v(0)  and  I  v(Q)f(x  I0)rt(0 )dn  <  «>, 

where  L#(0,  a)  is  the  ( k ,  /)th  element  of  L"(0,  a). 

B5.  L"(Q,  a*)  is  positive  definite  for  each  0. 

Condition  B5  implies  that  Jl"(0,  a*)f  (x  I0)7t(0)dp  is  positive  definite.  Under  Condition 
B,  A3  and  A4  are  satisfied  and  a*  is  unique.  Hence  am(co)  converges  to  a*  a.s. 

Theorem  3.  Under  Condition  B,  we  have 

am((0)~  a*  =  0(m~'h(loglogm)'12)  a.s. ,  (2.4) 

r(am((o))  -  r(a*)  =  0(m~hoglogm)  a.s.  (2.5) 

and 

(o)  -  a*)  ->d  N( 0,  E),  (2.6) 

where  — >d  denotes  convergence  in  distribution  and 

E  =  E^E^1  (2.7) 

with 

E!  =  Je[Z/(0,  fl*)][L'(0,  O]Tw2(0)/i(e)dn 

and 

E2  =  /eE"(0,a*)/(^l0)7t(0)d^. 
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If 


then 


and 


c2  =  f  [L(d,  a*)  -  Ha*)]2p2(Q\x)/h(Q)dii  < 

(2.8) 

rm  (co)  -  r  (a  * )  =  0  (m~'/2(loglog  m  )'h)  a.s. , 

(2.9) 

rm( co)  -  r{am (to))  =  0 (m ~'h (loglog mjh)  a.s.. 

(2.10) 

rn,/l(rm(o»-r(a*))  ->d  N(fl,  o2) 

(2.11) 

m'h(rm(c o)  -  r(am(ca)))  ->d  N( 0,  o}). 

(2.12) 

The  proof  is  given  in  the  Appendix. 

An  important  special  case  is  the  problem  of  estimating  9  with  loss  L(9,  a)=ll9-a  II2, 
9e  0cRp ,  a ed  <zRp .  The  Bayesian  action  is  the  posterior  mean  and 

aw(C0)"  m -l££1/ C*  1 6» )/ h (9,- )  ' 

A  straightforward  calculation  shows  that 

L  =  J0(9-a*  )(9-a*  )V2(0 \x)lh  (9)dp. 

and 

a2  =  J  [  II  © — a  *  II2  -  r(a*)]2p2(9lx)/7i(6)rffi. 

© 

When  /i(9)  is  the  posterior  density  p{ 9 lx),  £  reduces  to  the  posterior  covariance  matrix.  The 
posterior  expected  loss  of  am  (to)  in  this  case  is 

r (am (to))  =  r(a*)  +  llam((o)-a*  II2 
=  r  (a*)  +  O  (m~x  loglog  m )  as. 

Conditions  B3  and  (2.8)  suggest  that  the  importance  function  h(Q)  should  be  selected  so 
that  the  tails  of  /i  (0)  are  heavier  than  the  tails  of  the  posterior  p  (01  x).  Note  that  the  posterior 
mean  and  variance  of  L(Q,  a*)  are  r(a*)  and  cr2  =  o 2  with  A(9)=p(9l;r),  respectively.  From 
Theorem  3,  the  standard  error  due  to  Monte  Carlo  approximation  is  the  fraction  (ym)-  /j  of  o. 
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where  y=c1/a^  is  called  the  relative  numerical  efficiency  (Geweke,  1988b).  This  may  lead  to  a 
guide  to  choosing  the  importance  function  h(Q)  and  the  Monte  Carlo  approximation  sample 
size  m.  See  the  discussions  in  Geweke  (1988a,b). 

3.  MEASURES  OF  ACCURACY 

In  pr?c»’ce,  it  is  also  desirable  to  indicate  the  precisions  of  am(co)  and  rm( to)  (as  approxi¬ 
mations  to  the  Bayesian  action  and  posterior  expected  loss  of  am( co),  respectively)  by  using 
some  measures  of  accuracy.  A  nice  feature  of  the  Monte  Carlo  method  is  that  accuracy  esti¬ 
mates  of  and  rm( co)  can  be  computed.  With  a  method  of  estimating  the  precision  of 

am{(o),  the  Monte  Carlo  sample  size  m  can  be  chosen  by  using  an  iterative  method.  That  is, 
starting  with  an  initial  m0,  we  compute  am.  and  pm.  (an  estimate  of  the  precision  of  am)  for 
mj=mQ+jt,  where  t  is  a  step  size  and  j  =0,1,2,...  Stop  if  pm,  is  less  than  a  predescribed  small 
number.  This  method  is  used  in  Example  2  of  Section  4. 

We  discuss  two  accuracy  measures  of  the  Monte  Carlo  approximations  and  study  them  in 
an  illustrative  example. 

3.1.  Asymptotic  Variance  Approach 

The  variances  of  the  asymptotic  distributions  of  am( o>)  and  rm(co)  can  be  used  as  accuracy 
measures.  Since  in  general,  Z  (2.7)  and  c2  (2.8)  do  not  have  closed  forms,  they  have  to  be 
approximated  by  the  Monte  Carlo  approximations 

K  = 

and 

vm  =  am (to))  -  rm(o))]2w2(9()/[Mm(co)]2) 

respectively,  where 

Vj  =  (0,,  am(G)))][Z/(e,,  am(co))]Tw2(ei) 

and 

v2  =  am  (co))w  (0, ). 

The  estimated  asymptotic  variances  V^/m  and  Vrmlm  can  then  be  used  as  accuracy  measures 
of  flm(to)  and  rm  (to),  respectively.  The  following  theorem  proves  the  almost  sure  convergence 
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of  these  approximations. 


Theorem  4.  Assume  the  conditions  in  Theorem  3.  If,  in  addition,  there  is  a  measurable  func¬ 
tion  u  (0)  such  that 

supf  IIZ/(0,  a) II2:  aeN(c)  }  <  m(0)  and  f  u(Q)w2(Q)h(Q)d\i  <  «>,  (3.1) 

then 

V*  -» 1  as.  and  Vrm  — >  cr2  a.s. 

Proof.  The  results  follow  from  the  convergence  of  am  and  Theorem  2  of  Jennrich  (1969).  D 
3.2.  Independent  Samples 

In  some  situations  (e.g.,  <2  is  a  finite  set  or  L(0,  a)  is  not  differentiable),  the  asymptotic 
variance  approach  does  not  apply.  Another  method  for  obtaining  accuracy  measures  of  am  (to) 
and  rm( co)  is  as  follows:  we  repeat  the  calculation  of  the  Bayesian  action  (with  independent 
sets  of  random  0’s)  and  compute  the  sample  variance.  Suppose  that  m=gk  and  we  generate 
(from  Ph)  g  independent  sets  of  random  variables  co;=(01;,...,0^),  y'=l,...,g.  Calculate  the 
ak((L>j),  Pk(Mj),  rk((Oj)  according  to  (1.6),  j=l,...,g,  and  the  sample  variances 

Sfr  =  (co;  )~am  )(ak  ((Oj  yam  )\  am  =  ©y)  (3.2) 

and 

S'„  =  rm  =  g-'Zf.MUj). 

Sgk/g  and  Sgk/g  are  then  used  as  accuracy  measures  of  am  (co)  and  rm  (co),  respectively. 

The  motivation  of  the  use  of  independent  samples  for  assessing  the  accuracy  of  am{ co) 
and  rm(a> )  are  intuitively  obvious.  Unlike  the  asymptotic  variance  approach,  this  method  does 
not  require  conditions  B  and  (3.1).  But  it  may  require  more  computations  if  the  computation  of 
am  (co)  involves  a  difficult  minimization  problem,  since  it  needs  to  solve  the  same  minimization 
problem  g  times. 

Usually  k  should  be  chosen  to  be  large  and  g  may  be  chosen  to  be  small  or  moderate  (for 
computational  saving).  For  example,  if  m=10,000,  we  may  choose  g=10  (£=1,000).  Unless 
both  g  and  k  are  large,  the  accuracy  estimates  obtained  by  using  asymptotic  variance  approach 
and  independent  samples  may  be  different  (see  Example  1). 
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When  independent  samples  are  used,  to  reduce  the  computation,  we  may  use  am  defined 
in  (3.2)  as  the  Monte  Carlo  approximation  to  a* .  This  is  specially  preferred  if  the  computation 
of  am  is  expensive.  Although  am  and  am  are  generally  different,  the  difference  is  usually  inap¬ 
preciable  for  large  k . 

3.3.  An  Example 

We  study  the  accuracy  estimates  V°  and  Sgk  in  an  illustrative  example  where  the  exact 
values  of  a*  and  the  asymptotic  variance  Z  can  be  obtained  and  therefore  we  can  assess  the 
Monte  Carlo  approximations. 

Example  1.  (Berger,  1985,  Section  4.3).  Let  X  ~iV(0, 1),  where  0  is  a  measure  of  some 
positive  quantity.  In  this  case,  it  is  difficult  to  estimate  0  by  using  the  classical  approach.  For 
example,  the  maximum  likelihood  estimate  of  0  is  max(  x  ,  0  ),  which  is  unsuitable  if  the 
observation  x<  0.  Under  the  noninformative  prior  7t(0)=  /(O  oo)(0),  the  posterior  mean  of  0  is 

[ix=x  +<j>(x)/0(x), 

where  0(x)  is  the  standard  normal  distribution  function  and  4>(jc  )=0'(jc ).  In  this  case,  no 
Monte  Carlo  approximation  for  |ix  is  needed.  To  see  the  accuracy  of  the  Monte  Carlo  approxi¬ 
mations,  ten  independent  sets  (each  of  size  m=5,000)  of  random  variables  are  generated  from 
the  density  h{B)=e~^I  ^  ^(0)  and  the  Monte  Carlo  approximations  am  and  vm  =  (V“/m)'/z  are 
calculated  and  reported  in  Table  1  for  *=1.5,  0.5  and  -0.5.  The  exact  values  of  the  posterior 
mean  (j.z  and  the  asymptotic  standard  deviation  a  =  (Urn)'11  are  included.  Both  am  and  vm  are 
quite  accurate. 

For  comparison,  we  also  include  sm  =  (S°t)1/l  (g=10,  *=5,000)  in  Table  1.  In  all  three 
cases,  sm  is  quite  different  from  vm  (or  a),  even  though  k=m= 5,000.  This  is  not  surprise  since 
g=10  is  small. 

3.4.  Concluding  Remarks 

(1)  7/hen  the  loss  function  is  differentiable,  accuracy  estimates  based  on  asymptotic  variances 
are  recommended.  The  importance  function  h(B)  should  be  chosen  so  that  conditions  (2.8)  and 
(3.1)  are  satisfied. 
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(2)  When  the  asymptotic  variance  approach  can  not  be  applied,  use  independent  samples  to 
estimate  the  accuracy  of  the  Monte  Carlo  approximations. 

(3)  In  some  situations,  it  may  be  appropriate  to  focus  on  relative  accuracy  measures  such  as 
v®/l am  I  and  s“/\am  I,  where  v*=(V“/m)xl1  and  s°=(Sgic/g)'/'1  are  asymptotic  standard  devia¬ 
tions. 


4.  NUMERICAL  IMPLEMENTATION 

In  some  situations,  the  minimization  problem  in  Step  3  can  be  solved  analytically,  e.g., 
a  *  is  a  posterior  moment  or  quantile.  However,  there  are  situations  where  am  (to)  can  only  be 
obtained  numerically.  There  is  a  large  body  of  literature  on  optimization  and  nonlinear  pro¬ 
gramming  which  provides  many  efficient  algorithms  for  solving  minf  pm(a,  to):  aed  }.  See 
Avriel  (1976)  and  Rao  (1984),  which  also  provide  many  other  references. 

As  we  discussed  in  Section  1,  the  same  random  {  }  should  be  used  for  approximating 

p(a ).  If  storage  is  not  a  problem,  we  may  reduce  computations  by  storing  {  w  (0, )  }  and  com¬ 
puting  pm (a,  to)  =  m~l'EJ*lL(Qi,  a )w (0, )  for  all  needed  a,  where  w(0)  is  defined  in  (1.5). 

In  the  following  we  use  an  example  to  illustrate  the  numerical  implementation  of  the 
Monte  Carlo  method  in  practical  problems. 


Example  2.  A  company  has  developed  a  new  type  of  product  and  must  decide  the  number  of 
the  new  products  (denoted  by  a )  to  produce  based  on  a  marketing  survey.  Let  0  be  the  unk¬ 
nown  proportion  of  customers  who  will  favor  the  new  product. 


(1)  Likelihood  function.  A  sample  of  36  customers  was  interviewed  and  11  of  them  indicated 
that  they  favor  the  new  product.  Hence  the  likelihood  function  is  (f^)0!1(l — O)25. 

(2)  Loss  function.  Using  the  method  introduced  in  Becker,  DeGroot  and  Marschak  (1964), 


DeGroot  (1970,  Chapter  7)  and  Berger  (1985,  Chapter  2),  we  obtain  the  company’s  loss  func¬ 
tion 


L(0,  a)  = 


-2,500,000ca  +  l  a  <  c  0/2 

lO,562,5OO0-1(8a  /13-c  0/2)2  -  l,64O,625c20  +  /  cQ/2<a  <cQ 
1,500,000c (a -2c 0)  +  /  a  £  c0 
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where  c=2xl05  and /=1,640, 625c2.  Here,  0e0=(O,l)  and  aed  ={  all  the  integers  <  c  } . 

(3)  Prior  distribution.  It  is  felt  that  0.2<8<0.5  is  twice  as  likely  as  0<O.2  or  0>O.5  and  there  is 
no  other  information  about  8  available.  Thus,  we  use  the  vague  prior 

(2/9)/(0.2.oj)(e)  +  (1/21)[1— /  (O.2,O.5)(0)1- 

Neither  the  posterior  expected  loss  nor  its  Monte  Carlo  approximation  has  a  closed  form 
in  this  problem.  Note  that  CL  contains  finitely  many  actions.  However,  the  number  of  actions 
is  so  large  that  it  is  too  expensive  to  evaluate  pm  ( a ,  to)  for  all  actions.  Since  L  (0,  a )  is  a  con¬ 
vex  function  of  a ,  we  used  the  golden  section  method  (see  Avriel,  1976)  with  24  evaluations 
of  pm  ( a ,  co)  to  find  the  Monte  Carlo  approximation  to  the  Bayesian  action  a  * . 

To  assess  the  accuracy  of  the  Monte  Carlo  approximations,  we  used  independent  samples 
(Section  3.2)  since  the  asymptotic  variance  approach  is  not  applicable  to  this  case.  The  follow¬ 
ing  iterative  method  was  used  for  selecting  a  Monte  Carlo  sample  size:  Compute  am.  and  rm* 
for  mj  =  ( k0+jt)g ,  7=0, 1,2,...,  with  step  size  r= 1,000,  g=10  and  initial  ko=l,000  (m0= 10,000). 
At  each  iteration,  the  relative  accuracy  measures  c“  =  s“lcim  and  crm  =  srmlrm  were  computed, 
where  s°  =  ( S'k/g  )'/j  and  s„  =  (S£k/g  )'h.  Stop  if  both  and  crm,  are  less  than  0.001. 

The  importance  function  was  chosen  to  be  proportional  to  the  likelihood  function.  The 
random  0,  ’s  were  generated  by  using  the  IMSL  subroutine  GGBTR.  The  computation  was 
done  (after  four  iterations)  on  a  VAX  11/780  (Unix  4.3BSD)  at  Purdue  University.  The  total 
CPU  time  used  is  about  1.96  minutes. 

The  selected  Monte  Carlo  sample  size  is  m  =40,000  and  the  resulting  Monte  Carlo  approx¬ 
imations  to  a*  and  r(a*)  are  51,223  and  459,861.09,  respectively.  Table  2  shows  the  accuracy 
measures  and  the  results  from  four  iterations. 

APPENDIX 

Proof  of  Theorem  I.  The  existence  of  a  Bayesian  action  is  guaranteed  by  the  continuity  of 
r(a).  From  Theorem  2  of  Jennrich  (1969),  for  almost  every  o), 

p m(a,  co)  p(a)  uniformly  in  aeCl . 

Suppose  that  a' e  A  m.  Then  by  compactness  of  d  ,  a'  ed  .  Consequently,  there  is  a  sequence 
{  ntj  }Jl i  such  that  flm  (co)  -4  a'  and  pm.(amj((iD),  co)  — »  p(fl').  Since  for  a*edB , 
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Pm,- (%(©).  2  Pm/a  * )  ->  p(a  * ),  p(a'  )^p(a  * )  and  therefore  a  edB.  Thus,  4  fflc  dB . 

If  a*  is  the  unique  Bayesian  action,  then  (to)  — >  a* ,  p(am(o))  — >  p(a* )  and 
pm(<2m(co),  co)  — >  p(a*).  Hence  (2.2)  and  (2.3)  hold  since  Mm— »m(jc)  a.s. 

In  the  case  where  a*  is  not  unique,  let  p'  be  a  limit  point  of  {  pm(am(co),  to)  _/ ~=1.  Then 
there  is  a  sequence  {  rrij  }J°=l  such  that  pm  (am  (co),  co)  -»  p'.  From  the  compactness  of  CL , 
there  is  a  subsequence  {  mt  j/“1<=/'  ntj  }JLX  such  that  ami( co)  ->  a' .  By  the  above  proof, 
P m,(aOT,(co),  co)  -»  p(a')  =  p (a*).  Thus,  p'=p(a*)  and 

rOT(co)  =  pm(am  (co),  co)  -4  p(a*). 

Similarly,  we  can  show  that  p(am(co))  — >  p(a*)  and  therefore  (2.2)  and  (2.3)  hold.  n 

Proof  of  Theorem  2.  From  the  proof  of  Theorem  1,  the  result  follows  if  for  almost  all  co, 
there  exists  an  such  that 

am  (co)€  N(c )  for  all  m  £  (Al) 

Let  e  be  such  that  e>0  and  inf  {  p (a):  aeB(c) }  >  p(a*)  +  2e.  From  Condition  A  and 
Theorem  2  of  Jennrich  (1969),  for  almost  all  co,  there  is  an  such  that  when  m>mm, 

I pm {a ,  co)  p(a ) I  <e  for  all  aeN(c).  (A2) 

Let  bm  (co)  be  such  that 

Pm  ( bm  (co),  co)  =  minf  pm  (a ,  co):  a  e  N(c )  } .  (A3) 

For  a  fixed  suppose  that  llam(co)-a*  II  >c.  Then  there  exist  an  a^(co)eB(c)  and  a  X 

such  that  0<X.<1  and  a^(oi)=Xbm (co)+(l-^.)am (co).  By  the  convexity  of  L (8,  a). 

Pm «(®).  ©)  £  Xpm(6m(co),  co)  +  (l-X)pm(am(co),  co) 

£  pm(hm(co),  co)  £  pm(a^(co),  co), 

i.e.,  pm  (bm  (co),  co)  =  pm  (a^(co),  co).  However,  from  (A2), 

pm(a*,  co)  <  p(a*)  +  e  <  inf  {  p (a):  aeB(c) }  -  e 
^  P(0m«o))  -  e  <  pm(a*(co),  co)  =  pm  (bm  (co),  co), 
which  is  contrary  to  (A3).  Thus,  (Al)  holds. n 
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Proof  of  Theorem  3.  By  Theorem  2,  we  may  focus  on  the  event  {  am(co)eN(c) }.  Under 


B2-B4,  p'(a)=f  Z/(9,  a)f  (x  I0)tc(0)dp.  is  continuous  on  N(c)  and 

p'(a * )  =  Jem  a* )f  (x  \ B)n(&)d p  =  0.  (A4) 

Note  that  (9; ,  am  (co))w(9i  )=0  under  B2.  By  the  mean  value  theorem  and  (A4), 

a  *  )w  (0, )  =  aml)w(0,)](flm(o))-a* ),  (A5) 

p(am(co))  -  p(a*)  =  ta«(co)-a’)T[p"(am2)](am(o))-a*)  (A6) 

and 

rm(o>)  -  pm (a * ,03)  =  am3)]T(am (©)-«*),  (A7) 

where  amj,  j- 1,2,3,  are  on  the  line  segment  between  a*  and  am (at).  From  B4  and  Theorem  2, 

m"1^! L"(0,,  aml)w(0,)  -»  L2  a.s.  (A8) 


Hence  we  obtain  (2.4)  by  applying  the  law  of  iterated  logarithm  to  each  component  of  the  left 
hand  side  of  (A5).  Also,  (2.5)  follows  from  (2.5),  (A6),  the  continuity  of  p "(a)  and 
Mm— »m(x)  a.s.  From  the  central  limit  theorem,  (2.6)  follows  from  (A5),  (A8)  and  Theorem  2. 

If  Oy  is  finite,  by  the  law  of  iterated  logarithm, 

rm  (a  *  ,©)  -  r  (a  * )  =  O  (m-1/i(loglog  m  )'h )  a.s.  (A9) 

where  rm(a* ,(o )  =  pm(a*,0))/Mm(to).  From  B2-B4,  (A4)  and  Theorem  2  of  Jennrich  (1969), 

m-1X£iL'(0i’  °m3)  ->  0  a.s.  (A10) 

Hence  (2.9)  follows  from  (2.4),  (A7),  (A9)  and  (A  10),  and  (2.10)  follows  from  (2.5)  and  (2.9). 
From  the  central  limit  theorem  and  Mm  — »  m(x)  a.s. , 

m 1/2 [rm ( a *,(o)-r(a * )}  ->d  N (0,  Or2).  (All) 

Then  (2.11)  follows  from  (A7),  (A10),  (All)  and  (2.6).  Finally,  (2.12)  follows  from  (2.5)  and 

(2.11). D 
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Table  1 

Monte  Carlo  approximations  to  and  a  (Example  1) 
(  h  (0)  -  e~^ l (o  „)(0),  m  =5,000  ) 


x=  1.5 

t  =  0.5 

X 

=  -0.5 

Hx 

=  1.6388 

=  1.0091 

M*  = 

=  0.6412 

a  = 

0.015230 

a  = 

0.009418 

0  = 

0.006580 

a.  v« 

dm  V„ 

dm 

mmm 

1.6540 

0.015389 

1.0063 

0.009581 

0.6310 

0.006559 

1.6307 

0.014763 

1.0271 

0.009275 

0.6602 

0.006682 

1.6149 

0.015108 

1.0033 

0.009359 

0.6409 

0.006526 

1.6339 

0.015107 

1.0118 

0.009407 

0.6443 

0.006576 

1.6325 

0.015632 

0.9986 

0.009285 

0.6421 

0.006471 

1.6573 

0.015331 

1.0114 

0.009530 

0.6378 

0.006605 

1.6915 

0.015439 

1.0219 

0.009605 

0.6382 

0.006653 

1.6595 

0.015293 

1.0173 

0.009413 

0.6483 

0.006598 

1.6186 

0.015091 

1.0070 

0.009333 

0.6430 

0.006584 

1.6555 

0.015415 

1.0054 

0.009512 

0.6330 

0.006595 

1.6448 

0.015257 

1.0110 

0.009430 

0.6419 

0.006585 

HSfiSEZCSSZflH 

sm  =  0.007819 

Table  2 

Results  from  four  iterations  (Example  2) 
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